Online Acquisition of Japanese Unknown Morphemes using Morphological Constraints
Abstract
We propose a novel lexicon acquirer that works in concert with a morphological analyzer and can run in online mode. Each time a sentence is analyzed, the acquirer detects unknown morphemes, enumerates candidates for them, and selects the best candidates by comparing multiple examples kept in storage. When a morpheme is unambiguously selected, the lexicon acquirer adds it to the analyzer's dictionary, where it is used in subsequent analyses. By exploiting the constraints of Japanese morphology, we effectively reduce the number of examples required to acquire a morpheme. Experiments show that unknown morphemes were acquired with high accuracy and that acquiring them improved the quality of morphological analysis.
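The acquisition loop described above (store examples, keep only the candidates that the morphological constraints allow, commit a morpheme once it is unambiguous) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the paper's implementation: the class `LexiconAcquirer`, its methods `observe` and `candidates`, and the tiny suffix table `ENDINGS` are hypothetical. The stem boundary is taken as given rather than enumerated, and "exactly one class is consistent with every stored example" is a simplified stand-in for the paper's selection criterion.

```python
from collections import defaultdict

# Hypothetical, heavily simplified constraint table: for each morpheme class,
# the strings that may immediately follow the stem. A real analyzer would use
# full Japanese conjugation and connection rules instead.
ENDINGS = {
    "noun": ("が", "を", "に", "は", "の", "で", "と", "も", "だ"),
    "ichidan_verb": ("る", "た", "て", "ない", "ます", "に", "れば", "よう"),
    "godan_ku_verb": ("く", "き", "か", "け", "こ", "いた", "いて"),
}


class LexiconAcquirer:
    """Toy online acquirer (hypothetical): keeps examples per unknown stem and
    commits the stem to the dictionary once only one class fits them all."""

    def __init__(self, dictionary):
        self.dictionary = dictionary       # stem -> class, assumed shared with the analyzer
        self.examples = defaultdict(list)  # stem -> stored right contexts

    @staticmethod
    def compatible(cls, right_context):
        # Morphological constraint: the text after the stem must begin with an
        # ending licensed by the candidate class (startswith accepts a tuple).
        return right_context.startswith(ENDINGS[cls])

    def candidates(self, stem):
        # Classes consistent with every stored example of this stem.
        return [cls for cls in ENDINGS
                if all(self.compatible(cls, ctx) for ctx in self.examples[stem])]

    def observe(self, stem, right_context):
        """Record one occurrence of a stem; return its class if it is known or
        has just become unambiguous, otherwise None (keep waiting)."""
        if stem in self.dictionary:
            return self.dictionary[stem]
        self.examples[stem].append(right_context)
        remaining = self.candidates(stem)
        if len(remaining) == 1:
            self.dictionary[stem] = remaining[0]  # visible to later analyses
            del self.examples[stem]
            return remaining[0]
        return None
```

For example, `observe("食べ", "に行く")` returns `None` because both a noun and an ichidan verb stem may be followed by に; a second example, `observe("食べ", "たい")`, rules out the noun reading and the stem is acquired as an ichidan verb. A stem such as 書 seen once before かない is acquired immediately, since only the godan -ku class licenses か. This is the sense in which the constraints reduce the number of examples needed.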
Similar Articles
Online Japanese Unknown Morpheme Detection using Orthographic Variation
To solve the unknown morpheme problem in Japanese morphological analysis, we previously proposed a novel framework of online unknown morpheme acquisition and its implementation. This framework poses a previously unexplored problem, online unknown morpheme detection. Online unknown morpheme detection is a task of finding morphemes in each sentence that are not listed in a given lexicon. Unlike i...
Semantic Classification of Automatically Acquired Nouns using Lexico-Syntactic Clues
In this paper, we present a two-stage approach to acquire Japanese unknown morphemes from text with full POS tags assigned to them. We first acquire unknown morphemes only making a morphology-level distinction, and then apply semantic classification to acquired nouns. One advantage of this approach is that, at the second stage, we can exploit syntactic clues in addition to morphological ones bec...
Accuracy Order of Grammatical Morphemes in Persian EFL Learners: Evidence for and against UG
This study addresses the acquisition of the morphological markers in Persian learners of English as a foreign language. To this end, the accuracy order of nine morphemes including plural -s, progressive -ing, copula be, auxiliary be, irregular past tense, regular past tense -ed, third person -s, possessive -ʼs and indefinite articles was studied in 6...
Composition and Decomposition of Japanese Katakana and Kanji Morphemes for Decision Rule Induction from Patent Documents
We propose a new method to construct a word list for rule induction from Japanese patent documents. For word segmentation in Japanese, statistical morphological analyzers have been used in many applications. However, the output of these morphological analyzers presents defects when analyzing unknown words, specifically words that contain Kanji/Katakana morphemes. Some words are overly segmented...
Generalized unknown morpheme guessing for hybrid POS tagging of Korean
Most errors in Korean morphological analysis and POS (Part-of-Speech) tagging are caused by unknown morphemes. This paper presents a generalized unknown morpheme handling method with POSTAG (POStech TAGger), which is a statistical/rule-based hybrid POS tagging system. The generalized unknown morpheme guessing is based on a combination of a morpheme pattern dictionary which encodes general le...